Improving Structure Evaluation Through Automatic Hierarchy Expansion
Structural segmentation is the task of partitioning a recording into non-overlapping time intervals, and labeling each segment with an identifying marker such as A, B, or verse. Hierarchical structure annotation expands this idea to allow an annotator to segment a song at multiple levels of granularity. While there has been recent progress in developing evaluation criteria for comparing two hierarchical annotations of the same recording, the existing methods have known deficiencies when dealing with inexact label matchings and sequential label repetition. In this article, we investigate methods for automatically enhancing structural annotations by inferring (and expanding) hierarchical information from the segment labels. The proposed method complements existing techniques for comparing hierarchical structural annotations by coarsening or refining labels with variation markers to either collapse similarly labeled segments together, or separate identically labeled segments from each other. Using the multi-level structure annotations provided in the SALAMI dataset, we demonstrate that automatic hierarchy expansion allows structure comparison methods to more accurately assess similarity between annotations.
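The coarsening and refinement operations described above can be sketched in a few lines. This is only an illustration, not the paper's implementation: the function names and the convention that variation markers are primes (A, A', A'') are assumptions. Coarsening strips the markers so that variants collapse into one label; refinement appends occurrence indices so that repeats of the same label become distinguishable, yielding extra hierarchy levels above and below the original annotation.

```python
from collections import defaultdict


def coarsen(labels):
    """Collapse variation markers: A, A', A'' all map to A.
    Assumes primes mark variants (illustrative convention)."""
    return [lab.rstrip("'") for lab in labels]


def refine(labels):
    """Separate identically labeled segments by numbering occurrences,
    e.g. A, A, B, A -> A1, A2, B1, A3."""
    counts = defaultdict(int)
    out = []
    for lab in labels:
        counts[lab] += 1
        out.append(f"{lab}{counts[lab]}")
    return out
```

Stacking `coarsen(labels)`, `labels`, and `refine(labels)` gives a three-level hierarchy derived from a single flat annotation, which hierarchical comparison metrics can then consume.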
Transfer Learning and Bias Correction with Pre-trained Audio Embeddings
Deep neural network models have become the dominant approach to a large
variety of tasks within music information retrieval (MIR). These models
generally require large amounts of (annotated) training data to achieve high
accuracy. Because not all applications in MIR have sufficient quantities of
training data, it is becoming increasingly common to transfer models across
domains. This approach allows representations derived for one task to be
applied to another, and can result in high accuracy with less stringent
training data requirements for the downstream task. However, the properties of
pre-trained audio embeddings are not fully understood. Specifically, and unlike
traditionally engineered features, the representations extracted from
pre-trained deep networks may embed and propagate biases from the model's
training regime. This work investigates the phenomenon of bias propagation in
the context of pre-trained audio representations for the task of instrument
recognition. We first demonstrate that three different pre-trained
representations (VGGish, OpenL3, and YAMNet) exhibit comparable performance
when constrained to a single dataset, but differ in their ability to generalize
across datasets (OpenMIC and IRMAS). We then investigate dataset identity and
genre distribution as potential sources of bias. Finally, we propose and
evaluate post-processing countermeasures to mitigate the effects of bias, and
improve generalization across datasets.Comment: 7 pages, 3 figures, accepted to the conference of the International
Society for Music Information Retrieval (ISMIR 2023
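One simple post-processing countermeasure in this spirit can be sketched as follows. This is an assumption-laden illustration, not the paper's actual method: it treats dataset identity as a global offset in embedding space and removes it by mean-centering each dataset's embeddings separately.

```python
import numpy as np


def center_per_dataset(X, dataset_ids):
    """Subtract each dataset's mean embedding so that dataset identity
    is no longer encoded as a global offset in the representation.
    (Hypothetical countermeasure for illustration.)"""
    Xc = X.astype(float).copy()
    for d in np.unique(dataset_ids):
        mask = dataset_ids == d
        Xc[mask] -= Xc[mask].mean(axis=0)
    return Xc
```

After centering, a downstream instrument classifier trained on one dataset sees embeddings whose dataset-level offsets have been removed, which is one plausible way to reduce the propagation of dataset identity as a bias signal.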
A Proposal for Foley Sound Synthesis Challenge
"Foley" refers to sound effects that are added to multimedia during
post-production to enhance its perceived acoustic properties, e.g., by
simulating the sounds of footsteps, ambient environmental sounds, or visible
objects on the screen. While foley is traditionally produced by foley artists,
there is increasing interest in automatic or machine-assisted techniques
building upon recent advances in sound synthesis and generative models. To
foster more participation in this growing research area, we propose a challenge
for automatic foley synthesis. Through case studies on successful previous
challenges in audio and machine learning, we set the goals of the proposed
challenge: rigorous, unified, and efficient evaluation of different foley
synthesis systems, with an overarching goal of drawing active participation
from the research community. We outline the details and design considerations
of a foley sound synthesis challenge, including task definition, dataset
requirements, and evaluation criteria.
A Hybrid Approach to Music Playlist Continuation Based on Playlist-Song Membership
Automated music playlist continuation is a common task of music recommender
systems, which generally consists of providing a fitting extension to a given
playlist. Collaborative filtering models, which extract abstract patterns from
curated music playlists, tend to provide better playlist continuations than
content-based approaches. However, pure collaborative filtering models have at
least one of the following limitations: (1) they can only extend playlists
profiled at training time; (2) they misrepresent songs that occur in very few
playlists. We introduce a novel hybrid playlist continuation model based on
what we name "playlist-song membership", that is, whether a given playlist and
a given song fit together. The proposed model regards any playlist-song pair
exclusively in terms of feature vectors. In light of this information, and
after having been trained on a collection of labeled playlist-song pairs, the
proposed model decides whether a playlist-song pair fits together or not.
Experimental results on two datasets of curated music playlists show that the
proposed playlist continuation model performs comparably to a state-of-the-art
collaborative filtering model in the ideal situation of extending playlists
profiled at training time and where songs occurred frequently in training
playlists. In contrast to the collaborative filtering model, and as a result of
its general understanding of the playlist-song pairs in terms of feature
vectors, the proposed model is additionally able to (1) extend non-profiled
playlists and (2) recommend songs that occurred seldom or never in
training playlists.
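The playlist-song membership idea can be illustrated with a minimal sketch: represent each playlist-song pair by feature vectors and train a binary classifier on labeled pairs to decide whether they fit together. Everything below is an assumption for illustration, not the paper's model: the feature vectors are synthetic, and an elementwise-product interaction term is added so that a linear classifier can capture playlist-song similarity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Hypothetical 8-dim playlist and song feature vectors (synthetic stand-ins
# for whatever content/metadata features a real system would extract).
n = 200
playlists = rng.normal(size=(n, 8))
fit_songs = playlists + rng.normal(scale=0.3, size=(n, 8))  # songs that fit
rand_songs = rng.normal(size=(n, 8))                        # songs that do not


def pair_features(p, s):
    # Concatenation plus an elementwise product as a simple interaction
    # term, so a linear model can pick up playlist-song similarity.
    return np.hstack([p, s, p * s])


X = np.vstack([pair_features(playlists, fit_songs),
               pair_features(playlists, rand_songs)])
y = np.concatenate([np.ones(n), np.zeros(n)])

clf = LogisticRegression(max_iter=1000).fit(X, y)


def membership(p_vec, s_vec):
    """Probability that a single playlist-song pair fits together."""
    return clf.predict_proba(pair_features(p_vec, s_vec)[None, :])[0, 1]
```

Because the classifier only ever sees feature vectors, it can score a pair involving a playlist never seen at training time, or a song that appeared in few or no training playlists, which mirrors the advantages the abstract claims over pure collaborative filtering.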
- …